Combining Naive Bayes and Decision Tables
Authors
Abstract
We investigate a simple semi-naive Bayesian ranking method that combines naive Bayes with induction of decision tables. Naive Bayes and decision tables can both be trained efficiently, and the same holds true for the combined semi-naive model. We show that the resulting ranker frequently increases AUC significantly compared to either component technique, and for some datasets it significantly improves on both. This is also the case when attribute selection is performed in naive Bayes and its semi-naive variant.

Introduction

Our combined model is a simple Bayesian network in which the decision table (DT) represents a conditional probability table. It can be viewed as a restricted version of Pazzani's semi-naive Bayesian model (Pazzani 1996). The latter greedily joins attributes into multiple groups of dependent attributes, rather than just one group as in the method considered here (represented by the DT). This can result in more powerful models, but also increases computational complexity by an order of magnitude. Another difference is that search and evaluation in this paper are based on AUC instead of accuracy.

Learning the combined model

A DT stores the input data in condensed form, based on a selected set of attributes, and uses it as a lookup table when making predictions. Each entry in the table is associated with class probability estimates based on observed frequencies. The key to learning a DT is to select a subset of highly discriminative attributes. The standard approach is to choose the set that maximizes cross-validated performance. Cross-validation is efficient for DTs because the table structure does not change when instances are added or deleted; only the class counts associated with the entries change. Similarly, cross-validation for naive Bayes (NB) is efficient because frequency counts for discrete attributes can be updated in constant time. In our experiments we used forward selection to select attributes in stand-alone DTs because it performed significantly better than backward selection. Numeric attributes in the training data (including those to be modeled by NB) were discretized using MDL-based discretization (Fayyad & Irani 1993), with intervals learned from the training data.

The algorithm for learning the combined model (DTNB) proceeds in much the same way as the one for stand-alone DTs. At each point in the search it evaluates the merit of splitting the attributes into two disjoint subsets: one for the DT, the other for NB. We use forward selection: initially all attributes are modeled by the DT, and at each step the attributes selected so far are modeled by NB while the remainder stay with the DT. Leave-one-out cross-validated AUC is used to evaluate the quality of a split, based on the probability estimates generated by the combined model. Note that AUC can easily be replaced by other performance measures; we chose AUC to enable a fair comparison to NB (and hence only used two-class datasets in our experiments). AUC was also used to select attributes for the stand-alone DT.

The class probability estimates of the DT and NB must be combined to generate overall class probability estimates. Assuming X⊤ is the set of attributes in the DT and X⊥ the set in NB, the overall class probability is computed as

    Q(y | X) = α × Q_DT(y | X⊤) × Q_NB(y | X⊥) / Q(y),

where Q_DT(y | X⊤) and Q_NB(y | X⊥) are the class probability estimates obtained from the DT and NB respectively, α is a normalization constant, and Q(y) is the prior probability of the class. All probabilities are estimated using Laplace-corrected observed counts.
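To make the combination rule concrete, here is a minimal Python sketch (our illustration, not the authors' implementation; all names are ours). It stores the decision table as a dictionary keyed by the values of the DT attributes, estimates every probability from Laplace-corrected counts, and combines the DT and NB estimates exactly as in the formula above.

```python
from collections import defaultdict

def train_dtnb(X, y, dt_attrs, nb_attrs, n_classes):
    """Collect class counts for the DT and NB components.

    X: list of discrete attribute-value tuples; y: list of class indices.
    dt_attrs / nb_attrs: disjoint sets of attribute indices.
    """
    dt_counts = defaultdict(lambda: [0] * n_classes)  # DT entry -> class counts
    nb_counts = {a: defaultdict(lambda: [0] * n_classes) for a in nb_attrs}
    class_counts = [0] * n_classes
    for xi, yi in zip(X, y):
        class_counts[yi] += 1
        dt_counts[tuple(xi[a] for a in sorted(dt_attrs))][yi] += 1
        for a in nb_attrs:
            nb_counts[a][xi[a]][yi] += 1
    return dt_counts, nb_counts, class_counts

def predict_dtnb(x, dt_counts, nb_counts, class_counts,
                 dt_attrs, nb_attrs, n_classes, values_per_attr):
    """Q(y|X) = alpha * Q_DT(y|X⊤) * Q_NB(y|X⊥) / Q(y), Laplace-corrected."""
    n = sum(class_counts)
    prior = [(class_counts[y] + 1) / (n + n_classes) for y in range(n_classes)]
    entry = dt_counts[tuple(x[a] for a in sorted(dt_attrs))]
    q_dt = [(entry[y] + 1) / (sum(entry) + n_classes) for y in range(n_classes)]
    q_nb = list(prior)  # NB estimate: prior times per-attribute likelihoods
    for a in nb_attrs:
        cnt = nb_counts[a][x[a]]
        for y in range(n_classes):
            q_nb[y] *= (cnt[y] + 1) / (class_counts[y] + values_per_attr[a])
    score = [q_dt[y] * q_nb[y] / prior[y] for y in range(n_classes)]
    alpha = 1.0 / sum(score)  # normalization constant
    return [alpha * s for s in score]
```

As a sanity check on the rule, an empty nb_attrs reduces this to a plain decision table, and an empty dt_attrs reduces it to naive Bayes, since the degenerate factor cancels against the prior in both cases.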
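Building on the functions above, the forward-selection search over attribute splits might look like the following skeleton (the attribute-selection variant described below adds the option of discarding an attribute outright at each step). This is a sketch of the idea only: leave-one-out evaluation is written naively here, whereas the paper relies on constant-time count updates to make it efficient.

```python
def auc(scores, labels):
    """Two-class AUC: chance a random positive outranks a random negative."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def loo_auc(X, y, dt_attrs, nb_attrs, n_classes, values_per_attr):
    """Leave-one-out cross-validated AUC of the combined model for one split."""
    scores = []
    for i in range(len(X)):
        model = train_dtnb(X[:i] + X[i+1:], y[:i] + y[i+1:],
                           dt_attrs, nb_attrs, n_classes)
        probs = predict_dtnb(X[i], *model, dt_attrs, nb_attrs,
                             n_classes, values_per_attr)
        scores.append(probs[1])  # predicted probability of class 1
    return auc(scores, y)

def search_split(X, y, n_classes, values_per_attr):
    """Greedy forward selection: start with all attributes in the DT and
    repeatedly move the attribute whose transfer to NB most improves AUC."""
    dt_attrs, nb_attrs = set(range(len(X[0]))), set()
    best = loo_auc(X, y, dt_attrs, nb_attrs, n_classes, values_per_attr)
    improved = True
    while improved and dt_attrs:
        improved = False
        score, a = max((loo_auc(X, y, dt_attrs - {a}, nb_attrs | {a},
                                n_classes, values_per_attr), a)
                       for a in dt_attrs)
        if score > best:
            best, improved = score, True
            dt_attrs.remove(a)
            nb_attrs.add(a)
    return dt_attrs, nb_attrs, best
```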
In addition to the method described above, we also consider a variant that includes attribute selection and can discard attributes entirely from the combined model. To this end, at each step of the forward selection an attribute can be discarded rather than added to the NB model. In the experiments we compare this technique to NB with the same wrapper-based forward selection (also guided by AUC).

Empirical Results

Table 1 compares DTNB to NB and DTs on 35 UCI datasets. Multi-class datasets were converted into two-class datasets by merging all classes except the largest one. We performed 50 runs of the repeated holdout method, setting aside 66% of the data for training and the rest for testing, and report the mean AUC and standard deviation. Identical runs were used for each algorithm. We used the corrected resampled t-test (Nadeau & Bengio 2003) at the 5% level.
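The corrected resampled t-test referred to above can be computed as in the short sketch below (our rendering of the Nadeau & Bengio statistic; SciPy is assumed only for the t-distribution's tail probability).

```python
import math
from statistics import mean, variance
from scipy import stats

def corrected_resampled_ttest(scores_a, scores_b, n_train, n_test):
    """Corrected resampled t-test (Nadeau & Bengio 2003).

    scores_a, scores_b: per-run AUC values for two learners over k identical
    repeated-holdout runs; n_train, n_test: instances in each split.
    Returns the t statistic and a two-sided p-value with k - 1 degrees
    of freedom.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    k = len(diffs)
    # The usual 1/k variance term is inflated by n_test/n_train to account
    # for the overlap between training sets across runs.
    t = mean(diffs) / math.sqrt((1 / k + n_test / n_train) * variance(diffs))
    p = 2 * stats.t.sf(abs(t), df=k - 1)
    return t, p
```

With the setup above, k = 50 and n_test/n_train ≈ 34/66; a difference counts as significant when p < 0.05.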
Similar articles
Simple decision forests for multi-relational classification
An important task in multi-relational data mining is link-based classification, which takes advantage of attributes of links and linked entities to predict the class label. The relational naive Bayes classifier exploits independence assumptions to achieve scalability. We introduce a weaker independence assumption to the effect that information from different data tables is independent given the c...
Guideline generation from data by induction of decision tables using a Bayesian network framework
Decision tables can be used to represent practice guidelines effectively. In this study we adopt the powerful probabilistic framework of Bayesian Networks (BN) for the induction of decision tables. We discuss the simplest BN model, Naive Bayes, and extend it to the Two-Stage Naive Bayes. We show that reversal of edges in Naive Bayes and Two-Stage Naive Bayes results in simple decision table ...
Using a Hierarchical Bayesian Model to Handle High Cardinality Attributes with Relevant Interactions in a Classification Problem
We employed a multilevel hierarchical Bayesian model in the task of exploiting relevant interactions among high-cardinality attributes in a classification problem without overfitting. With this model, we calculate posterior class probabilities for a pattern W combining the observations of W in the training set with prior class probabilities that are obtained recursively from the observations of...
Diagnosis of Pulmonary Tuberculosis Using Artificial Intelligence (Naive Bayes Algorithm)
Background and Aim: Despite the implementation of effective preventive and therapeutic programs, no significant success has been achieved in reducing tuberculosis. One of the reasons is delayed diagnosis. A diagnostic aid system can therefore help with the early diagnosis of tuberculosis. The purpose of this research was to evaluate the role of the Naive Bayes algorithm as a...
Comparing Machine Learning Approaches for Context-Aware Composition
Context-Aware Composition allows optimal variants of algorithms, data structures, and schedules to be selected automatically at runtime using generalized dynamic Dispatch Tables. These tables grow exponentially with the number of significant context attributes. To make Context-Aware Composition scale, we suggest four alternative implementations to Dispatch Tables, all well-known in the field of machi...
Hybrid Bayesian Estimation Trees Based on Label Semantics
A linguistic decision tree (LDT) [7] is a classification model based on a random-set-based semantics referred to as label semantics [4]. Each branch of a trained LDT is associated with a probability distribution over classes. In this paper, two hybrid learning models combining linguistic decision trees and a fuzzy Naive Bayes classifier are proposed. In the first model, an unlabelled ins...
Venue: Proceedings of the Twenty-First International FLAIRS Conference
Publication date: 2008